Abstract

In finite mixtures of location-scale distributions, the maximum likelihood
estimate does not exist if the parameters are unconstrained. When the ratios
of the scale parameters are restricted appropriately, however, the maximum
likelihood estimate exists. We prove that the maximum likelihood estimator
(MLE) is strongly consistent if the ratios of the scale parameters are
restricted from below by $\exp(-n^d)$, $0 < d < 1$, where $n$ is the sample size.

Key words and phrases: Mixture distribution, maximum likelihood estimator, 
consistency. 

1 Introduction 

In this paper we consider mixtures of $M$ location-scale densities, defined by

\[ f(x;\theta) = \sum_{m=1}^{M} \alpha_m f_m(x;\mu_m,\sigma_m), \]

where $\alpha_m$, called the mixing weights, are nonnegative real numbers that sum to
one and $f_m(x;\mu_m,\sigma_m)$, called the components of the mixture, are location-scale
density functions with location parameter $\mu_m \in \mathbb{R}$ and scale parameter
$\sigma_m > 0$. $\theta$ contains all the parameters in the mixture and can be written as
$\theta = (\alpha_1, \mu_1, \sigma_1, \ldots, \alpha_M, \mu_M, \sigma_M)$.
The location-scale densities satisfy

\[ f_m(x;\mu_m,\sigma_m) = \frac{1}{\sigma_m} f_m\!\left(\frac{x-\mu_m}{\sigma_m}; 0, 1\right). \]

We allow the $f_m(x;\mu_m,\sigma_m)$ to belong to different families. For example,
$f_1(x;\mu_1,\sigma_1)$ may be a normal density, $f_2(x;\mu_2,\sigma_2)$ may be a uniform density, etc.

In finite mixtures of location-scale distributions, the maximum likelihood
estimate over the whole parameter space does not exist: for given data
$x_1, \ldots, x_n$, the likelihood function of a mixture is unbounded. For example, if
we set $\mu_1 = x_1$ and let $\sigma_1 \to 0$ in a mixture of two normal densities with means
$\mu_1, \mu_2$ and standard deviations $\sigma_1, \sigma_2$, then the likelihood tends to infinity.
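The unboundedness described above can be checked numerically. The following is a minimal sketch, assuming a two-component normal mixture with equal weights; the toy sample and the function names are hypothetical and chosen only for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and scale sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_loglik(data, alphas, mus, sigmas):
    """Log likelihood of a finite normal mixture for the given sample."""
    total = 0.0
    for x in data:
        density = sum(a * normal_pdf(x, m, s)
                      for a, m, s in zip(alphas, mus, sigmas))
        total += math.log(density)
    return total

# Hypothetical toy sample (an assumption, not data from the paper).
data = [-1.3, -0.2, 0.4, 1.1, 2.5]

# Pin mu_1 to the first observation and shrink sigma_1 toward zero,
# keeping the second component fixed at mean 0 and scale 1.
logliks = [
    mixture_loglik(data, [0.5, 0.5], [data[0], 0.0], [sigma1, 1.0])
    for sigma1 in (1e-1, 1e-3, 1e-6, 1e-9)
]
for value in logliks:
    print(value)
```

The printed values grow without bound as $\sigma_1$ shrinks, because the first component contributes a term of order $1/\sigma_1$ at $x_1$ while the second component keeps the density at the remaining points bounded away from zero.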

Let us consider the constrained parameter space

\[ \Theta_b = \Big\{\theta \in \Theta \;\Big|\; \min_{1 \le m, m' \le M} \sigma_m/\sigma_{m'} \ge b \Big\}, \quad 0 < b \le 1, \]

where $\Theta$ denotes the unconstrained parameter space. Hathaway (1985) showed
that the global maximizer $\hat\theta_b$ of the likelihood function over $\Theta_b$ exists and that,
if the true parameter value belongs to $\Theta_b$, then $\hat\theta_b$ is strongly consistent.
This leaves open how small $b$ may be chosen while still ensuring strong consistency.
An interesting question here is whether we can decrease the bound $b$ to zero with
the sample size and still guarantee the strong consistency of the maximum likelihood
estimator. This question is mentioned in Hathaway (1985) and McLachlan and Peel
(2000), where it is treated as an unsolved problem.



Meanwhile, in Tanaka and Takemura (2006) we considered mixtures of location-scale
distributions with constraints imposed on the scale parameters themselves
and showed that the maximum likelihood estimator is strongly consistent if
the scale parameters are restricted from below by $\exp(-n^d)$, $0 < d < 1$.
Tanaka and Takemura (2005) implies that the rate $\exp(-n^d)$ obtained in Tanaka and
Takemura (2006) is almost the lower bound that ensures strong consistency. The method
used in Tanaka and Takemura (2006) is useful for solving the problem stated in
Hathaway (1985), in which the constraints are imposed on the ratios of the scale
parameters.

In this paper, we solve the problem stated in Hathaway (1985). We prove
that the maximum likelihood estimator is strongly consistent if the ratios of
the scale parameters are restricted from below by $\exp(-n^d)$, $0 < d < 1$.

The organization of the paper is as follows. In section 2 we prepare notation
and summarize some preliminary results. In section 3 we state our main result
in theorem 2. Section 4 is devoted to the proof of theorem 2. The last section
gives the conclusion and future work.



2 Notation and definitions 
2.1 Notation 

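Let $\Omega_m = \mathbb{R} \times (0, \infty)$ denote the parameter space of the $m$-th component
$(\mu_m, \sigma_m)$. Then the entire parameter space $\Theta$ can be represented as follows.

\[ \Theta = \Big\{(\alpha_1, \ldots, \alpha_M) \in \mathbb{R}^M \;\Big|\; \sum_{m=1}^{M} \alpha_m = 1,\ \alpha_m \ge 0 \Big\} \times \prod_{m=1}^{M} \Omega_m. \]

Let $\mathcal{K}$ be a subset of $\{1, 2, \ldots, M\}$ and let $|\mathcal{K}|$ denote the number of elements
in $\mathcal{K}$. Denote by $\theta_{\mathcal{K}}$ a subvector of $\theta \in \Theta$ consisting of the components
in $\mathcal{K}$. Then the parameter space of subprobability measures consisting of the
components in $\mathcal{K}$ is

\[ \bar\Theta_{\mathcal{K}} = \Big\{\theta_{\mathcal{K}} \;\Big|\; \theta \in \Theta,\ \sum_{m \in \mathcal{K}} \alpha_m \le 1 \Big\}. \]

The corresponding density and the set of subprobability densities are denoted by

\[ f_{\mathcal{K}}(x;\theta_{\mathcal{K}}) = \sum_{m \in \mathcal{K}} \alpha_m f_m(x;\mu_m,\sigma_m), \qquad \mathcal{G}_{\mathcal{K}} = \{ f_{\mathcal{K}}(x;\theta_{\mathcal{K}}) \mid \theta_{\mathcal{K}} \in \bar\Theta_{\mathcal{K}} \}. \]

Furthermore denote the set of subprobability densities with no more than $K$
components by

\[ \mathcal{G}_K = \bigcup_{|\mathcal{K}| \le K} \mathcal{G}_{\mathcal{K}} \quad (1 \le K \le M). \]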

2.2 Identifiability and strong consistency of estimators 

In general, a parametric family of distributions is identifiable if different values
of the parameter designate different distributions. In mixtures of distributions,
different parameters may designate the same distribution. For example, if $\alpha_1 = 0$,
then all parameters which differ only in $\mu_1$ or $\sigma_1$ give the same distribution.
Thus mixtures of distributions are not identifiable, and we have
to define strong consistency of an estimator carefully. The following definition
is essentially the same as in Redner (1981). We assume that the parameter space
$\Theta$ is a subset of Euclidean space and $\mathrm{dist}(\theta, \theta')$ denotes the Euclidean distance
between $\theta, \theta' \in \Theta$. Furthermore we define

\[ \mathrm{dist}(U, V) = \inf_{\theta \in U} \inf_{\theta' \in V} \mathrm{dist}(\theta, \theta') \]

for $U, V \subset \Theta$.

Definition 1. (a strongly consistent estimator)
Let $\Theta_0$ denote the set of true parameters

\[ \Theta_0 = \{\theta \in \Theta \mid f(x;\theta) = f(x;\theta_0) \ \text{a.e. } x\}, \]

where $\theta_0$ is one of the parameters designating the true distribution, and let

\[ \Theta(\theta') = \{\theta \in \Theta \mid f(x;\theta) = f(x;\theta') \ \text{a.e. } x\}. \]

An estimator $\hat\theta_n$ is strongly consistent if

\[ \mathrm{Prob}\Big( \lim_{n \to \infty} \mathrm{dist}(\Theta(\hat\theta_n), \Theta_0) = 0 \Big) = 1. \]

2.3 Preliminaries



In Tanaka and Takemura (2006), we assume the following regularity conditions
for strong consistency of the constrained maximum likelihood estimator.

Assumption 1. There exist real constants $v_0, v_1 > 0$ and $\beta > 1$ such that

\[ f_m(x; \mu_m = 0, \sigma_m = 1) \le \min\{v_0,\ v_1 \cdot |x|^{-\beta}\} \]

for all $m$.

Let $\Gamma$ denote any compact subset of $\Theta$.

Assumption 2. For $\theta \in \Theta$ and any positive real number $\rho$, let

\[ f(x;\theta,\rho) = \sup_{\mathrm{dist}(\theta',\theta) \le \rho} f(x;\theta'). \]

For each $\theta \in \Gamma$ and sufficiently small $\rho$, $f(x;\theta,\rho)$ is measurable.

Assumption 3. For each $\theta \in \Gamma$, if $\lim_{j \to \infty} \theta^{(j)} = \theta$ $(\theta^{(j)} \in \Gamma)$, then
$\lim_{j \to \infty} f(x;\theta^{(j)}) = f(x;\theta)$ except on a set which is a null set and does not
depend on the sequence $\{\theta^{(j)}\}_{j=1}^{\infty}$.

Assumption 4.

\[ \int |\log f(x;\theta_0)|\, f(x;\theta_0)\, dx < \infty. \]

Let $E_0[\cdot]$ denote the expectation under the true parameter $\theta_0$. In Tanaka and Takemura
(2006), we showed the following theorem.

Theorem 1. (Tanaka and Takemura (2006)) Suppose that assumptions 1-4 are
satisfied and $f(x;\theta_0) \in \mathcal{G}_M \setminus \mathcal{G}_{M-1}$. Let $c_0 > 0$ and $0 < d < 1$. If
$c_n = c_0 \cdot \exp(-n^d)$ and

\[ \Theta_{c_n} = \Big\{\theta \in \Theta \;\Big|\; \min_{1 \le m \le M} \sigma_m \ge c_n \Big\}, \]

then

\[ \mathrm{Prob}\Big( \lim_{n \to \infty} \mathrm{dist}(\Theta(\hat\theta_{c_n}), \Theta_0) = 0 \Big) = 1, \]

where $\hat\theta_{c_n}$ is the maximum likelihood estimator restricted to $\Theta_{c_n}$.

3 Main result 

To show the strong consistency of the constrained maximum likelihood estimator
in the problem stated in Hathaway (1985), we replace assumption 1
with the following assumption.

Assumption 5. There exist real constants $v_0, v_1 > 0$ and $\beta > 2$ such that

\[ f_m(x; \mu_m = 0, \sigma_m = 1) \le \min\{v_0,\ v_1 \cdot |x|^{-\beta}\} \]

for all $m$.

Now we state the main theorem of this paper.

Theorem 2. Suppose that assumptions 2-5 are satisfied and
$f(x;\theta_0) \in \mathcal{G}_M \setminus \mathcal{G}_{M-1}$. Let $b_0 > 0$ and $0 < d < 1$. If $b_n = b_0 \cdot \exp(-n^d)$ and

\[ \Theta_{b_n} = \Big\{\theta \in \Theta \;\Big|\; \min_{1 \le m \ne m' \le M} \frac{\sigma_m}{\sigma_{m'}} \ge b_n \Big\}, \]

then

\[ \mathrm{Prob}\Big( \lim_{n \to \infty} \mathrm{dist}(\Theta(\hat\theta_{b_n}), \Theta_0) = 0 \Big) = 1, \]

where $\hat\theta_{b_n}$ is the maximum likelihood estimator restricted to $\Theta_{b_n}$. □
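To make the constraint of theorem 2 concrete, the following minimal sketch computes $b_n = b_0 \exp(-n^d)$ and checks whether a vector of scale parameters satisfies the ratio constraint. The function names and the example values are hypothetical; for $M \ge 2$ the minimum over all ratios $\sigma_m/\sigma_{m'}$ with $m \ne m'$ equals the smallest scale divided by the largest.

```python
import math

def ratio_bound(n, d, b0=1.0):
    """Return b_n = b0 * exp(-n^d), the lower bound on the scale ratios."""
    return b0 * math.exp(-(n ** d))

def in_constrained_space(sigmas, n, d, b0=1.0):
    """Check whether min over m != m' of sigma_m / sigma_m' is at least b_n."""
    return min(sigmas) / max(sigmas) >= ratio_bound(n, d, b0)

# Example with hypothetical scales: the same pair of scales is excluded for a
# small sample and admitted for a larger one, because b_n shrinks with n.
print(in_constrained_space([1.0, 1e-3], n=10, d=0.5))
print(in_constrained_space([1.0, 1e-3], n=100, d=0.5))
```

The constrained space therefore grows with the sample size, which is what allows the bound on the ratios to tend to zero.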

4 Proof 

In this section, we prove theorem 2 by using theorem 1. The organization of
this section is as follows. First, in subsection 4.1, we partition the parameter
space $\Theta_{b_n}$ into two sets. The proof of strong consistency of the maximum
likelihood estimator restricted to $\Theta_{b_n}$ is split accordingly: the proof for one
set follows by applying theorem 1, and the proof for the other set is
given in subsection 4.2.



4.1 Partitioning the parameter space 

Let $0 < d < 1$ be any constant and define $b_n = \exp(-n^d)$. We choose $d'$
such that $d < d' < 1$ and define $c_n = \exp(-n^{d'})$. Notice that the following
arguments also hold when we define $b_n = b_0 \cdot \exp(-n^d)$ with a positive
constant $b_0$. Define
$\Theta_{b_n} = \{\theta \in \Theta \mid \min_{1 \le m \ne m' \le M} \sigma_m/\sigma_{m'} \ge b_n\}$ and
$\Theta_{c_n} = \{\theta \in \Theta \mid \min_{1 \le m \le M} \sigma_m \ge c_n\}$.
The constrained parameter space $\Theta_{b_n}$ can be divided into two sets,

\[ \Theta_{b_n} = (\Theta_{b_n} \cap \Theta_{c_n}) \cup (\Theta_{b_n} \cap \Theta_{c_n}^c), \]

where $\Theta_{c_n}^c$ is the complement of $\Theta_{c_n}$.

From theorem 1, the maximum likelihood estimator over $\Theta_{b_n} \cap \Theta_{c_n}$ is strongly
consistent. If the maximum of the likelihood function over $\Theta_{b_n} \cap \Theta_{c_n}^c$ is very small,
then the maximum likelihood estimator over $\Theta_{b_n}$ is strongly consistent. By the
argument used in Wald (1949), in order to prove theorem 2 it suffices to prove
the following lemma.

Lemma 1.

\[ \lim_{n \to \infty} \frac{\sup_{\theta \in \Theta_{b_n} \cap \Theta_{c_n}^c} \prod_{i=1}^{n} f(x_i;\theta)}{\prod_{i=1}^{n} f(x_i;\theta_0)} = 0, \quad \text{a.e.} \tag{1} \]

4.2 Proof of lemma 1



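Let

\[ \sigma_{(1)} = \min_{1 \le m \le M} \sigma_m, \qquad \sigma_{(M)} = \max_{1 \le m \le M} \sigma_m. \]

Then for $\theta \in \Theta_{c_n}^c$,

\[ \sigma_{(1)} < c_n. \tag{2} \]

Furthermore for $\theta \in \Theta_{b_n}$,

\[ \frac{\sigma_{(1)}}{\sigma_m} \ge b_n, \quad 1 \le m \le M. \tag{3} \]

Therefore for $\theta \in \Theta_{b_n} \cap \Theta_{c_n}^c$,

\[ \sigma_m < c_n / b_n = \exp(n^d - n^{d'}), \quad 1 \le m \le M. \tag{4} \]

This means that all the scale parameters $\sigma_m$ of $\theta \in \Theta_{b_n} \cap \Theta_{c_n}^c$ are very small for
large $n$. Hence the maximum of the likelihood function over $\Theta_{b_n} \cap \Theta_{c_n}^c$ is expected
to be small.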

Let $E_0[\cdot]$ denote the expectation under the true parameter $\theta_0$. By the law of
large numbers, (1) is implied by

\[ \limsup_{n \to \infty} \frac{1}{n} \sup_{\theta \in \Theta_{b_n} \cap \Theta_{c_n}^c} \sum_{i=1}^{n} \log f(x_i;\theta) < E_0[\log f(x;\theta_0)] \quad \text{a.e.} \tag{5} \]

Therefore, in order to prove (1) it suffices to prove (5).

4.2.1 Step 1 : Bounding the components by step functions

Figure 1: Each component is bounded by a step function.

Define

\[ \nu(\sigma) = \left(\frac{v_1}{v_0}\right)^{1/\beta} \cdot \sigma^{1 - 2/\beta}. \tag{6} \]

From assumption 5, each component is bounded from above as

\[ f_m(x;\mu_m,\sigma_m) \le \max\Big\{ 1_{[\mu_m - \nu(\sigma_m),\, \mu_m + \nu(\sigma_m)]}(x) \cdot \frac{v_0}{\sigma_m},\ v_0 \sigma_m \Big\}. \]

See figure 1. From this and $\sigma_{(1)} \le \sigma_m \le \sigma_{(M)}$ we obtain the following lemma.

Lemma 2.

\[ f_m(x;\mu_m,\sigma_m) \le \max\Big\{ 1_{[\mu_m - \nu(\sigma_m),\, \mu_m + \nu(\sigma_m)]}(x) \cdot \frac{v_0}{\sigma_{(1)}},\ v_0 \sigma_{(M)} \Big\}, \quad 1 \le m \le M. \]
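The step function bound can be checked numerically for a single component. The following is a minimal sketch under stated assumptions: the component is normal, $\beta = 3$, $v_0$ is the supremum of the standard normal density, $v_1$ is the supremum of $|x|^3 \phi(x)$, and the half-width is taken as $\nu(\sigma) = (v_1/v_0)^{1/\beta} \sigma^{1-2/\beta}$, which is the smallest value for which the tail bound $v_1 \sigma^{\beta-1} |x-\mu|^{-\beta}$ falls below $v_0 \sigma$. All names in the code are hypothetical.

```python
import math

BETA = 3.0                                  # assumed tail exponent (beta > 2)
V0 = 1.0 / math.sqrt(2.0 * math.pi)         # sup of the standard normal density
V1 = 3.0 ** 1.5 * math.exp(-1.5) * V0       # sup of |x|^3 * phi(x), at |x| = sqrt(3)

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and scale sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def nu(sigma):
    """Half-width of the interval on which the tall step v0 / sigma is used."""
    return (V1 / V0) ** (1.0 / BETA) * sigma ** (1.0 - 2.0 / BETA)

def step_bound(x, mu, sigma):
    """max{ indicator(|x - mu| <= nu(sigma)) * v0 / sigma, v0 * sigma }."""
    inside = abs(x - mu) <= nu(sigma)
    return max(V0 / sigma if inside else 0.0, V0 * sigma)

def bound_holds(mu, sigma, grid_points=2001, tol=1e-12):
    """Check density <= step bound on an evenly spaced grid around mu."""
    half_width = 10.0 * max(nu(sigma), sigma)
    for i in range(grid_points):
        x = mu - half_width + 2.0 * half_width * i / (grid_points - 1)
        if normal_pdf(x, mu, sigma) > step_bound(x, mu, sigma) * (1.0 + tol):
            return False
    return True

for sigma in (1.0, 0.1, 1e-3):
    print(sigma, nu(sigma), bound_holds(0.0, sigma))
```

The small relative tolerance only absorbs floating-point rounding at the mode, where the density and the tall step coincide. Note that $\nu(\sigma) \to 0$ as $\sigma \to 0$ only because $\beta > 2$, which is where assumption 5 strengthens assumption 1.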

4.2.2 Step 2 : Bounding the likelihood function by two terms 

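Let $R_n(V)$ denote the number of observations which belong to a set $V \subset \mathbb{R}$.
Define

\[ J(\theta) = \bigcup_{m=1}^{M} [\mu_m - \nu(\sigma_m),\ \mu_m + \nu(\sigma_m)]. \]

Lemma 3. For all $\theta \in \Theta_{b_n} \cap \Theta_{c_n}^c$,

\[ \sum_{i=1}^{n} \log f(x_i;\theta) \le R_n(J(\theta)) \cdot \log\frac{v_0}{\sigma_{(1)}} + R_n(J(\theta)^c) \cdot \log(v_0 \sigma_{(M)}). \tag{7} \]

Proof: From lemma 2 we obtain

\[
\begin{aligned}
\sum_{i=1}^{n} \log f(x_i;\theta)
&= \sum_{i=1}^{n} \log\Big\{ \sum_{m=1}^{M} \alpha_m f_m(x_i;\mu_m,\sigma_m) \Big\} \\
&\le \sum_{i=1}^{n} \max_{m \in \{1,\ldots,M\}} \log f_m(x_i;\mu_m,\sigma_m) \\
&\le \sum_{i=1}^{n} \log \max\Big\{ 1_{J(\theta)}(x_i) \cdot \frac{v_0}{\sigma_{(1)}},\ v_0 \sigma_{(M)} \Big\} \\
&= R_n(J(\theta)) \cdot \log\frac{v_0}{\sigma_{(1)}} + R_n(J(\theta)^c) \cdot \log(v_0 \sigma_{(M)}).
\end{aligned}
\]

□

In the following we bound the right hand side of (7) from above.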
4.2.3 Step 3 : Bounding the first term 

Let $x_1, \ldots, x_n$ denote a random sample of size $n$ from $f(x;\theta_0)$ and let

\[ x_{n,1} = \min\{x_1, \ldots, x_n\}, \qquad x_{n,n} = \max\{x_1, \ldots, x_n\}. \]

In Tanaka and Takemura (2006), we showed the following lemma.

Lemma 4. (Tanaka and Takemura (2006)) For any real positive constants $A_0 > 0$,
$\zeta > 0$, define

\[ A_n = A_0 \cdot n^{\frac{2+\zeta}{\beta-1}}. \tag{8} \]

Then

\[ \mathrm{Prob}\big( x_{n,1} < -A_n \ \text{or} \ x_{n,n} > A_n \ \ \text{i.o.} \big) = 0. \]

By this lemma we can bound the behavior of the minimum and the maximum
of the sample with probability 1. In the following we ignore the event
$\{x_{n,1} < -A_n \ \text{or} \ x_{n,n} > A_n\}$.

Next we prove the following lemma for bounding the first term of (7).

Lemma 5.

\[ \forall \theta \in \Theta_{b_n} \cap \Theta_{c_n}^c,\ \forall \varepsilon > 0, \quad \mathrm{Prob}\big( \max\{R_n(J(\theta)) - 4M,\ 0\} > \varepsilon \ \ \text{i.o.} \big) = 0. \]

Proof: Define

\[ w_n = \nu(c_n / b_n). \tag{9} \]

From lemma 4 we can ignore the event $\{x_{n,1} < -A_n \ \text{or} \ x_{n,n} > A_n\}$. Now we
divide $[-A_n, A_n]$ from $-A_n$ to $A_n$ into short intervals of length $2w_n$. Let $k(w_n)$ be
the number of the short intervals and let $I_1(w_n), \ldots, I_{k(w_n)}(w_n)$ be the resulting
short intervals. The length of the rightmost short interval $I_{k(w_n)}(w_n)$ may be
less than $2w_n$. See figure 2. Then we have

\[ k(w_n) \le \frac{2A_n}{2w_n} + 1 = \frac{A_n}{w_n} + 1. \tag{10} \]

Figure 2: Division of the interval $[-A_n, A_n]$ by short intervals of length $2w_n$.

From (4), (6) and (9) we have

\[ \nu(\sigma_1), \nu(\sigma_2), \ldots, \nu(\sigma_M) \le \nu(c_n/b_n) = w_n. \]

Since $J(\theta) = \bigcup_{m=1}^{M} [\mu_m - \nu(\sigma_m),\ \mu_m + \nu(\sigma_m)]$ consists of $M$ intervals of length
at most $2w_n$, $J(\theta) \cap [-A_n, A_n]$ is covered by at most $2M$ of the short intervals
$I_1(w_n), \ldots, I_{k(w_n)}(w_n)$. Therefore the following relation holds.

\[
\begin{aligned}
\{\max\{R_n(J(\theta)) - 4M,\ 0\} > \varepsilon\}
&\Rightarrow \{\max\{R_n(J(\theta) \cap [-A_n, A_n]) - 4M,\ 0\} > \varepsilon\} \\
&\Rightarrow \{\exists k,\ 1 \le k \le k(w_n),\ R_n(I_k(w_n)) > 2\}.
\end{aligned}
\]

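Let $u_0 = \sup_x f(x;\theta_0)$. Let $P_0(V)$ denote the probability of $V \subset \mathbb{R}$ under the
true density,

\[ P_0(V) = \int_V f(x;\theta_0)\, dx. \]

From (10), $R_n(I_k(w_n)) \sim \mathrm{Bin}(n, P_0(I_k(w_n)))$ and $P_0(I_k(w_n)) \le 2 w_n u_0$ we obtain

\[
\begin{aligned}
\mathrm{Prob}\big( \max\{R_n(J(\theta)) - 4M,\ 0\} > \varepsilon \big)
&\le \sum_{k=1}^{k(w_n)} \mathrm{Prob}\big( R_n(I_k(w_n)) > 2 \big) \\
&\le k(w_n) \cdot \max_{1 \le k \le k(w_n)} \mathrm{Prob}\big( R_n(I_k(w_n)) > 2 \big) \\
&\le \Big( \frac{A_n}{w_n} + 1 \Big) \sum_{k=2}^{n} \binom{n}{k} (2 w_n u_0)^k (1 - 2 w_n u_0)^{n-k} \\
&\le \Big( \frac{A_n}{w_n} + 1 \Big) (2 n w_n u_0)^2 \exp(2 n w_n u_0). \tag{11}
\end{aligned}
\]

From (6), (8) and (9), the order of the right hand side of (11) is evaluated as
follows.

\[ \Big( \frac{A_n}{w_n} + 1 \Big) (2 n w_n u_0)^2 = O\Big( n^{2 + \frac{2+\zeta}{\beta-1}} \cdot e^{(n^d - n^{d'})(1 - 2/\beta)} \Big), \qquad \exp(2 n w_n u_0) = O(1). \]

Recall that $0 < d < d' < 1$ and $\beta > 2$ by assumption 5. Then

\[ \sum_{n=1}^{\infty} n^{2 + \frac{2+\zeta}{\beta-1}} \cdot e^{(n^d - n^{d'})(1 - 2/\beta)} < \infty. \]

Therefore, when we sum the right hand side of (11) over $n$, the resulting series
converges. Hence by the Borel-Cantelli lemma, we have

\[ \mathrm{Prob}\big( \max\{R_n(J(\theta)) - 4M,\ 0\} > \varepsilon \ \ \text{i.o.} \big) = 0. \]

□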
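The convergence of the series used in the Borel-Cantelli step can be illustrated numerically. The following is a minimal sketch, assuming the general term $n^{2 + (2+\zeta)/(\beta-1)} e^{(n^d - n^{d'})(1 - 2/\beta)}$ and hypothetical parameter values $d = 0.5$, $d' = 0.8$, $\beta = 3$, $\zeta = 1$; the terms are handled on the log scale to avoid overflow.

```python
import math

def log_term(n, d, d_prime, beta, zeta):
    """Log of n^(2 + (2 + zeta)/(beta - 1)) * exp((n^d - n^d') * (1 - 2/beta))."""
    power = 2.0 + (2.0 + zeta) / (beta - 1.0)
    return power * math.log(n) + (n ** d - n ** d_prime) * (1.0 - 2.0 / beta)

def partial_sum(n_max, d, d_prime, beta, zeta):
    """Partial sum of the series over n = 1, ..., n_max."""
    return math.fsum(math.exp(log_term(n, d, d_prime, beta, zeta))
                     for n in range(1, n_max + 1))

# Hypothetical parameter choice satisfying 0 < d < d' < 1 and beta > 2.
params = dict(d=0.5, d_prime=0.8, beta=3.0, zeta=1.0)
for n_max in (100, 1000, 2000, 4000):
    print(n_max, partial_sum(n_max, **params), log_term(n_max, **params))
```

The polynomial factor is eventually dominated by the factor $e^{-(n^{d'} - n^d)(1 - 2/\beta)}$, so the partial sums stabilize; this domination relies on both $d < d'$ and $\beta > 2$.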






4.2.4 Step 4 : Evaluating the likelihood function 

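From lemma 5 we can ignore the event $R_n(J(\theta)) > 4M$. In the following we
consider only the event $\{R_n(J(\theta)) \le 4M\}$. Note that $R_n(J(\theta)^c) \ge n - 4M$.
For sufficiently large $n$, the scale parameters $\sigma_1, \ldots, \sigma_M$ of $\theta \in \Theta_{b_n} \cap \Theta_{c_n}^c$ are
very small, so that we can bound the right hand side of (7) as follows.

\[
\begin{aligned}
R_n(J(\theta)) \cdot \log\frac{v_0}{\sigma_{(1)}} + R_n(J(\theta)^c) \cdot \log(v_0 \sigma_{(M)})
&\le 4M \cdot \log\frac{v_0}{\sigma_{(1)}} + (n - 4M) \cdot \log(v_0 \sigma_{(M)}) \\
&\le n \cdot \log v_0 + (n - 8M) \cdot \log \sigma_{(1)} + (n - 4M) \cdot \log\frac{1}{b_n} \quad \text{a.e.,} \tag{12}
\end{aligned}
\]

where the last inequality holds by (3), i.e. $\sigma_{(M)} b_n \le \sigma_{(1)}$. Recall that we set
$0 < d < d' < 1$. Then from (2) and (12), together with $b_n = \exp(-n^d)$ and
$c_n = \exp(-n^{d'})$, we have

\[
\frac{1}{n} \sup_{\theta \in \Theta_{b_n} \cap \Theta_{c_n}^c} \sum_{i=1}^{n} \log f(x_i;\theta)
\le \frac{1}{n} \Big\{ n \cdot \log v_0 + (n - 8M) \cdot (-n^{d'}) + (n - 4M) \cdot n^d \Big\} \to -\infty \quad \text{a.e.}
\]

Therefore we obtain (5) and lemma 1 is proved.
This completes the proof of theorem 2.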
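The divergence of the final bound can be illustrated numerically. The following is a minimal sketch that evaluates the per-observation bound $\log v_0 + (1 - 8M/n)(-n^{d'}) + (1 - 4M/n)\, n^d$ for growing $n$; the function name and the parameter values ($M = 2$, $d = 0.5$, $d' = 0.8$, $v_0 = 0.4$) are hypothetical.

```python
import math

def mean_loglik_bound(n, M, d, d_prime, v0):
    """Upper bound on (1/n) * sup log likelihood over the small-scale region."""
    return (math.log(v0)
            + (1.0 - 8.0 * M / n) * (-(n ** d_prime))
            + (1.0 - 4.0 * M / n) * (n ** d))

# Hypothetical values satisfying 0 < d < d' < 1.
bounds = [mean_loglik_bound(n, M=2, d=0.5, d_prime=0.8, v0=0.4)
          for n in (100, 1000, 10000)]
for value in bounds:
    print(value)
```

Because $d' > d$, the negative term of order $n^{d'}$ dominates the positive term of order $n^d$, so the bound falls below any finite value such as $E_0[\log f(x;\theta_0)]$.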



5 Conclusion 

In this paper we prove that if we set $b_n = \exp(-n^d)$, $0 < d < 1$, then the
maximum likelihood estimator restricted to
$\Theta_{b_n} = \{\theta \in \Theta \mid \min_{1 \le m \ne m' \le M} \sigma_m/\sigma_{m'} \ge b_n\}$
is strongly consistent under very mild regularity conditions. Mixtures of
normal distributions satisfy the regularity conditions. This means that the problem
stated in Hathaway (1985) is solved.

If we define $b_n = \exp(-n^r)$, $r > 1$, and set $\theta$ as

\[ \mu_1 = x_1, \quad \sigma_1 = \exp(-n^r), \qquad \mu_m = 0, \quad \sigma_m = 1 \ (m \ne 1), \]

then $\theta \in \Theta_{b_n}$ and the mean log likelihood of this density tends to infinity
(Tanaka and Takemura (2005)). This means that the mean log likelihood of the
true model, which converges to a finite value almost everywhere, is dominated by
that of other models. Therefore if $b_n$ decreases to zero faster than $\exp(-n)$,
then the consistency of the maximum likelihood estimator fails. This implies
that the rate $b_n = \exp(-n^d)$, $0 < d < 1$, obtained in this paper is almost the
lower bound of the order of $b_n$ which maintains the consistency.

In theorem 2 we assume $f(x;\theta_0) \in \mathcal{G}_M \setminus \mathcal{G}_{M-1}$, i.e. the number of components
of the true model is known. To discuss the case that the number of components of
the true model is unknown, more complicated mathematical techniques are needed.